Abstract. As of today, object categorization algorithms are not able to achieve
the level of robustness and generality necessary to work reliably in the real world. 
Even the most powerful convolutional neural network we can train fails to perform 
satisfactorily when trained and tested on data from different databases. This 
issue, known as domain adaptation and/or dataset bias in the literature, is due 
to a distribution mismatch between data collections. Methods addressing it range 
from max-margin classifiers to learning how to modify the features and obtain 
a more robust representation. Recent work showed that by casting the problem 
into the image-to-class recognition framework, the domain adaptation problem 
is significantly alleviated [19]. Here we follow this approach, and show how a 
very simple, learning free Naive Bayes Nearest Neighbor (NBNN)-based domain 
adaptation algorithm can significantly alleviate the distribution mismatch between 
source and target data, especially when the number of classes and the number of 
sources grow. Experiments on standard benchmarks used in the literature show 
that our approach (a) is competitive with the current state of the art on small scale 
problems, and (b) achieves the current state of the art as the number of classes 
and sources grows, with minimal computational requirements. 

Keywords: Naive Bayes Nearest Neighbor, domain adaptation, transfer learning 


1 Introduction 

In recent years, the attention of the computer vision research community has been 
drawn towards the differences across predefined image datasets and the need to 
reconcile these idiosyncrasies. The main reason behind this need is the increasing 
amount of available image data sources and the absence of a unique general learning 
method that can perform well across all of them. In practice, training a classifier on 
one dataset (e.g. Flickr photos) and testing on another (e.g. images captured with a 
mobile phone) produces very poor results, even though the task (i.e. the set of depicted 
object categories) is the same. 

In this context, the notion of domain, already used in machine learning for speech 
and language processing, has been extended to visual problems. A source domain ($\mathcal{S}$) 
usually contains a large amount of labeled images, while a target domain ($\mathcal{T}$) refers 
broadly to a dataset that is assumed to have different characteristics from the source, 
and few or no labeled samples. Formally, two domains differ when their probability 
distributions satisfy $P^s(x,y) \neq P^t(x,y)$, where $x \in \mathcal{X}$ indicates 
the generic image sample and $y \in \mathcal{Y}$ the corresponding class label. Specific annotator 
tendencies may influence the conditional distributions, implying $P^s(y|x) \neq P^t(y|x)$. 
Other typical causes of visual domain shift include changes in the acquisition device, 
image resolution, lighting, background, viewpoint and post-processing [?]. Most of 
this information is directly encoded in the descriptor space $\mathcal{X}$ chosen to represent the 
images, and may induce a difference between the marginal distributions, $P^s(x) \neq P^t(x)$. 
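
The inequalities above are linked by the standard factorization of a joint distribution. The short identity below is added only for clarity and is not part of the original derivation:

```latex
\begin{aligned}
P^s(x,y) &= P^s(y \mid x)\, P^s(x), \\
P^t(x,y) &= P^t(y \mid x)\, P^t(x).
\end{aligned}
```

If the joint distributions coincided, marginalizing over $y$ would give $P^s(x) = P^t(x)$, and hence $P^s(y|x) = P^t(y|x)$ wherever $P^s(x) > 0$. Conversely, a mismatch in either the marginals or the conditionals (on a set of positive probability) implies $P^s(x,y) \neq P^t(x,y)$.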

In 2013, Tommasi and Caputo showed that by casting the domain adaptation problem 
into the Naive Bayes Nearest Neighbor framework one could achieve a very high 
level of generalization, thanks to the intrinsic properties of NBNN classifiers [19]. The 
proposed approach used distance metric learning to leverage the source knowledge 
at the local patch level. This brought strong results in the semi-supervised and 
unsupervised domain adaptation scenarios, but the method is computationally expensive and 
thus not suitable for real-time systems, such as smartphones or robots. 

Here we propose a simple, learning free domain adaptation method that makes it 
possible to exploit the generalization power of NBNN in the domain adaptation setting. 
We leverage the source patches by randomly selecting a subset of them and adding 
them to the target patches. To further increase the descriptive power of the descriptors, 
we perform data augmentation on both the source and the target data, as is standard 
practice in the Convolutional Neural Network literature [?]. The combined effect of 
these two simple actions is remarkable: on commonly used benchmark databases, our 
approach is on par with the current state of the art when there is a single source from 
which to adapt, and when the number of classes is limited. In the more challenging 
and more realistic settings of multiple sources and increasing number of classes, our 
algorithm achieves the state of the art. 

The rest of the paper is organized as follows: after reviewing previous work 
(Section 2), we recall the basic definitions for domain adaptation (Section 3.1) and the 
NBNN framework (Section 3.2). Section 4 introduces our approach, while Section 5 
presents its thorough experimental evaluation. We conclude with a summary discussion 
and an outline of possible future avenues for research. 


2 Related Work 

The problem of domain adaptation stems from the fact that supervised learning methods 
fail to generalize across datasets [?]. Although this problem exists in various 
applications [?], the visual recognition community has only recently shown interest in 
dealing with it [?]. Failure to generalize across datasets has been attributed to the 
mismatch among various characteristics of the considered databases, and is usually 
referred to as the ‘dataset bias’ problem [?]. The fact that different image datasets vary 
considerably in quality, point of view and image content suggests that addressing the 
domain adaptation problem can significantly improve the performance of visual 
recognition applications. 

Several approaches have been adopted for reducing the distance between datasets. 
These approaches range from transferring source data to the target domain [?] to 
transferring both source and target data to a third space [?]. Unfortunately, despite all 
efforts, [12] showed that currently existing selective transfers do not offer significant 
improvement over random transfers. As an alternative to enriching the target data through 
instance-based transfer from the source, attempts have been made to modify the classifier 
during training in order to resolve the mismatch problem [?]. 

While the image-to-image paradigm is the dominant approach in the above-mentioned 
methods, [17, 18] suggested replacing Bag of Words (BOW) representations combined 
with an image-to-image classification paradigm with NBNN, an image-to-class 
recognition framework. This idea frees domain transfer from the known 
shortcomings of BOW representations [?]. 

Even though NBNN has been tested on several visual learning applications, its use 
in domain adaptation has been limited. Only in 2013 did [19] exploit its potential in a 
metric learning approach, showing that with NBNN one can easily surpass the state of 
the art among the BOW-based algorithms presented so far. However, the possible uses 
of this method, called DA-NBNN, are restricted by its computational complexity. Indeed, 
as the number of classes, the number of sources and the amount of data for each class 
and source grow, DA-NBNN becomes computationally prohibitive. Our approach 
overcomes these computational limitations while preserving, and often significantly 
surpassing, the performance of DA-NBNN, and is the first learning free NBNN-based 
domain adaptation method in the literature. 

3 Problem Setting and Definitions 

In this section we set the scene by introducing formal definitions for the domain 
adaptation problem (Section 3.1) and the NBNN classification framework (Section 3.2). The 
notation introduced in this section will then be used to present our algorithm. 

3.1 Domain Adaptation 

Domain adaptation is the problem where knowledge from the source domain $\mathcal{D}^s$ is used 
to enrich, and hence improve, the performance in the target domain $\mathcal{D}^t$. This knowledge 
from the source might be in the form of instances (data), model parameters, or a metric 
induced by the source. It is usually implicitly assumed that labeled data in the target 
domain either does not exist (unsupervised setting) or is scarce (semi-supervised setting). 
Although the source and the target domains are different, they share the same label set [?]: 

$\mathcal{Y}^s = \mathcal{Y}^t$. 

The core cause of mismatch between the two domains is attributed to differences in their 
underlying distributions: the conditional probabilities of labels given features do not fully 
coincide, $P^s(Y|X) \neq P^t(Y|X)$, and the marginal data distributions are not equal either, 
$P^s(X) \neq P^t(X)$. In this paper, we focus exclusively on the 
semi-supervised setting. 

3.2 Naive Bayes Nearest Neighbor 

In the Naive Bayes Nearest Neighbor (NBNN) classification framework, it is assumed 
that for each class there exists a distribution from which local descriptors are drawn 
independently of one another. This leads to the use of a Naive Bayes maximum a 
posteriori classifier [?], where each feature $f_m$ votes for one of the classes 
$c \in \{1, \dots, C\}$. This voting is realized using the local distance between each feature 
and its nearest neighbor in class $c$, $D_{F2C}(f_m, c) = \| f_m - f_m^c \|$, where $f_m^c$ denotes 
the nearest neighbor of $f_m$ among the features of class $c$. The generalization of this 
distance to an image-to-class distance is straightforward: for an image $i$ described by the 
set of local features $F_i = \{f_1, \dots, f_{M_i}\}$, 

$D_{I2C}(F_i, c) = \sum_{m=1}^{M_i} D_{F2C}(f_m, c)$. 

Fig. 1: An overview of our proposed learning free, NBNN-based domain adaptation 
approach for the class ‘cow’: after performing data augmentation on both the source 
and target data, patches-based features are extracted from both, and a new target data 
set is created by merging all the patches-based features extracted from the target 
with a fraction of those of the source, randomly selected from the whole sample data. 
This new pool of patches-based features is then used to build an NBNN classifier in the 
target domain. 

The output of the classifier is then 

$p = \arg\min_{c} D_{I2C}(F_i, c)$    (1) 

The distance to this optimal class $p$ is called the positive distance, while the distances 
to the remaining classes $n \in \{c \neq p\}$, $D_{I2C}(F_i, n)$, are called the negative distances. 
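
A minimal NumPy sketch of the NBNN decision rule in Eq. (1), assuming each class support set is a 2-D array of local descriptors (one row per descriptor) and using the plain Euclidean local distance defined above. The function names are hypothetical, and the brute-force nearest-neighbor search is for illustration only; practical implementations typically rely on KD-trees or approximate nearest-neighbor search.

```python
import numpy as np


def image_to_class_distance(descriptors: np.ndarray, class_support: np.ndarray) -> float:
    """D_I2C(F_i, c): sum over the image's descriptors f_m of D_F2C(f_m, c),
    the Euclidean distance from f_m to its nearest neighbor in class c."""
    # Pairwise squared distances between the M_i query descriptors and the
    # N_c support descriptors of class c: shape (M_i, N_c).
    diffs = descriptors[:, None, :] - class_support[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # D_F2C(f_m, c) = ||f_m - f_m^c||, with f_m^c the nearest neighbor in class c.
    nn_dists = np.sqrt(sq_dists.min(axis=1))
    return float(nn_dists.sum())


def nbnn_classify(descriptors, support_sets):
    """Return the predicted class p = argmin_c D_I2C(F_i, c) and all distances."""
    dists = {c: image_to_class_distance(descriptors, s) for c, s in support_sets.items()}
    p = min(dists, key=dists.get)
    return p, dists


# Toy usage with two classes and 2-D descriptors.
support = {
    "cow": np.array([[0.0, 0.0], [1.0, 0.0]]),
    "dog": np.array([[5.0, 5.0]]),
}
query = np.array([[0.1, 0.0], [0.9, 0.1]])
p, dists = nbnn_classify(query, support)
positive = dists[p]                                     # positive distance
negatives = {c: d for c, d in dists.items() if c != p}  # negative distances
print(p)  # cow
```

Some NBNN formulations sum squared local distances instead; dropping the `np.sqrt` call would give that variant.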

4 Learning free NBNN-based domain adaptation 

As outlined above, the problem of domain adaptation emerges when the training data 
for the target task is scarce. Were this not the case, any supervised learning algorithm 
would be capable of learning a classifier, according to its learning abilities. It is also 
assumed that there exists at least one other dataset with enough samples to learn a good 
classifier (the source), but since the two datasets have been acquired in different 
domains, the performance obtained by training on the source and testing on the target is 
not satisfactory. 

The NBNN algorithm builds support sets for each class made of the collection of all 
the features extracted from patches of each of the training examples. Due to the scarcity 
of the data on the target, the support sets that can be built solely using features from 
the target samples will not contain enough features to guarantee a solid performance. 
In order to enrich these support sets, our proposal is to use features extracted from the 
patches of the source images. 

How should such patches-based features be selected? In [12], the authors investigate a 
domain adaptation approach based on the idea of landmark samples from the source domain, 
which are relevant for modeling the target classifier. Although their approach 
is theoretically sound, experiments show that the learning method proposed to select 
such landmarks is often statistically on par with, and otherwise within a two percent range 
of, a random selection of the learning samples. Motivated by this 
result, we apply the same philosophy here to the patches-based features, and propose 
to achieve domain adaptation in an NBNN-based framework by randomly sampling a 
percentage of the patches-based features from the source and adding them to the 
patches-based features of the target. We show with experiments in the next section that 
this extremely simple and learning free strategy achieves remarkably good results on 
standard domain adaptation benchmark databases, while being reasonably stable with 
respect to the amount of features sampled. 
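
The random transfer step can be sketched as follows, under assumptions the text does not fix: source descriptors are grouped by class, the sampling percentage is applied per class to that class's source descriptors, and sampling is without replacement. The function `augment_support_sets` and its parameters are hypothetical names used only for illustration.

```python
import numpy as np


def augment_support_sets(target_sets, source_sets, fraction, seed=0):
    """Build NBNN support sets for the target domain (illustrative sketch).

    For every class c, keep all target descriptors of c and append a random
    `fraction` (between 0 and 1) of the source descriptors of c, drawn
    without replacement.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = np.random.default_rng(seed)
    merged = {}
    for c, target_feats in target_sets.items():
        source_feats = source_sets.get(c)
        if source_feats is None or len(source_feats) == 0:
            merged[c] = target_feats  # nothing to transfer for this class
            continue
        k = int(round(fraction * len(source_feats)))
        idx = rng.choice(len(source_feats), size=k, replace=False)
        merged[c] = np.vstack([target_feats, source_feats[idx]])
    return merged
```

For example, with 3 target and 10 source descriptors for a class and `fraction=0.2`, the resulting support set holds 3 + 2 = 5 descriptors. Because no parameters are learned, the only cost beyond plain NBNN is the larger support sets searched at test time.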

To further improve the performance of our approach, we have tested the effect of 
performing data augmentation on the source and target data. Data augmentation is a 
technique that, since the spectacular success of convolutional neural networks in the visual 
classification arena, has been shown to be very effective for classification algorithms 
in general [3]. Again, our experiments confirm the effectiveness of this strategy, even 
more so when combined with the instance-based domain adaptation approach based on 
random sampling of patches-based features from the source. A schematic representation of 
the overall approach for the class ‘cow’ is given in Figure 1. Note that adding the data 
augmentation step to our overall approach does not significantly increase the almost 
non-existent computational load in training. This characteristic, combined with the 
remarkably good performance achieved especially as the number of classes and sources 
grows, makes our approach potentially attractive for applications where computational 
complexity should be low, such as mobile robots or online wearable systems. To the best of 
our knowledge, there are no previous instance-based, NBNN-based domain adaptation 
methods in the literature, nor has the random sampling strategy ever been tested in the 
NBNN learning framework for any learning to learn approach. 

5 Experiments 

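In this section we describe the experiments we performed to assess our approach. We 
first present the datasets, features and setup (Section 5.1), then we report the results 
(Section 5.2) and discuss them (Section 5.3). 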



5.1 Datasets, Features and setup 


Datasets We used the Office dataset, the standard test bed in domain adaptation, which 
addresses the problem of object categorization between any two datasets of objects 
usually found in offices [?]. This test bed consists of three domains, namely Amazon, 
Webcam and Dslr. The Amazon dataset contains images obtained from online merchants; 
the images are centered and usually on a white background. Webcam and Dslr contain 
low-resolution and high-resolution images obtained from a webcam and an SLR camera, 
respectively. Unlike Amazon, they can be subject to various environmental disturbances 
such as lighting or background changes. The Office dataset contains 31 classes of images 
for each domain. 

Selecting 10 of the original 31 Office classes, [13] proposed adding images of the same 
10 classes from Caltech-256 [?], forming the Office+Caltech test bed with a fourth 
domain. 


Features Following the protocol of [?], images were all resized to a common width 
(256px) and then converted to grayscale. SURF features were extracted according to 
[?]. The result was a set of 64-dimensional features that were subsequently fed to a 
1-nearest neighbor classifier. 

The effect of data augmentation on both domains has also been studied. To this end, we 
replicated the exact procedure suggested in [?]: each image is converted into 
10 images through cropping and flipping. 
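
The text does not spell out the crop geometry, so the sketch below assumes the common ten-crop scheme (four corner crops and one center crop, each also mirrored horizontally), which yields exactly 10 images per input. The function name `ten_crop` and the crop size are hypothetical parameters, not values taken from the paper.

```python
import numpy as np


def ten_crop(image: np.ndarray, crop_h: int, crop_w: int) -> list:
    """Return 10 views of `image`: 4 corner crops, 1 center crop, and the
    horizontal mirror of each. Works for (H, W) and (H, W, C) arrays."""
    h, w = image.shape[:2]
    if crop_h > h or crop_w > w:
        raise ValueError("crop size must not exceed the image size")
    top_c, left_c = (h - crop_h) // 2, (w - crop_w) // 2
    origins = [
        (0, 0),                    # top-left
        (0, w - crop_w),           # top-right
        (h - crop_h, 0),           # bottom-left
        (h - crop_h, w - crop_w),  # bottom-right
        (top_c, left_c),           # center
    ]
    crops = [image[top:top + crop_h, left:left + crop_w] for top, left in origins]
    mirrored = [c[:, ::-1] for c in crops]  # flip along the width axis
    return crops + mirrored
```

Each of the 10 views would presumably then go through the same grayscale conversion and SURF extraction as the original image, so that its descriptors join the source or target pool.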


Setup Different pairs of datasets from the Office+Caltech group are chosen to act as 
the source and the target. In every class, 20 images were selected from the source 
dataset, but only 3 from the target. When the target was Webcam, 15 source images were 
selected instead of 20, as described in [?]. At this stage, since the Dslr dataset behaves 
very similarly to Webcam and contains fewer images, we decided not to include it in our 
benchmarking. 

The same sample selection protocol has been adopted for the 31-class adaptation 
experiments. The third setup we considered is domain adaptation from more than one 
source to a single target. To this end, all possible combinations of two sources and one 
target have been examined and benchmarked against the existing results reported in the 
literature. 
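
The per-class sampling protocol of the setup can be sketched as below. It is an illustrative reconstruction: the helper names, the use of Python's `random` module and the seeding scheme are assumptions, and how the remaining target images are used for testing is not covered.

```python
import random


def sample_per_class(labels, per_class, seed=0):
    """Return indices of `per_class` randomly drawn samples for every class
    (all samples of a class if it has fewer than `per_class`)."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    chosen = []
    for y in sorted(by_class):
        idx = by_class[y]
        chosen.extend(rng.sample(idx, min(per_class, len(idx))))
    return chosen


def semi_supervised_split(source_labels, target_labels, target_name, seed=0):
    """20 labeled source images per class (15 when the target is Webcam)
    and 3 labeled target images per class, following the setup above."""
    n_source = 15 if target_name == "webcam" else 20
    source_idx = sample_per_class(source_labels, n_source, seed)
    target_idx = sample_per_class(target_labels, 3, seed + 1)
    return source_idx, target_idx
```

For the two-source setting, one plausible reading is to draw such a per-class sample from each source separately and pool the resulting descriptors before the random transfer step.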

5.2 Results 

The first set of experiments was done on a subset of Office+Caltech consisting of 10 
classes, as explained in [19]. Figure 2 shows the results in comparison to the state of the 
art and some baseline algorithms. 

Figures 2b, 2c and 2d show how the recognition rate changes as the percentage of 
descriptors randomly transferred from the source to the target increases. For a better 
understanding of the effects of different factors, four cases are shown together. 
Original data denotes the case where no augmentation is performed on either the source 
or the target domain. The cases where only the source or only the target domain has been 
augmented are referred to as Source Augmented and Target Augmented, respectively. 
Source and Target Augmented is the case where both domains have been over-sampled. 

Fig. 2: Results for the 10 class experiments. Figure 2a shows the overall results obtained 
by our method compared against state of the art algorithms. Figure 2b shows the change 
in recognition rate of our method on the Amazon-Caltech experiment as the percentage 
of source data transferred to the target set increases, for the cases no augmentation, only 
source data augmentation, only target data augmentation and both source and target data 
augmentation. Analogous results are shown in Figure 2c and Figure 2d for the 
Webcam-Amazon and Caltech-Webcam settings, respectively. 
Panels: (a) 10-class experiments, overall; (b) A-C results; (c) W-A results; (d) C-W results. 

The second set of experiments is done on the 31-class Office dataset, following exactly 
the protocol described in [19]. Table 1 shows the results in comparison with the state of 
the art among NBNN-based methods and the state of the art among methods not based on 
NBNN. Some further baselines are also included for a better comparison. 


Algorithm      A-W           W-D           D-W
BOW            34.9 ± 0.6    48.9 ± 0.5    38.4 ± 0.4
GFK            46.4 ± 0.5    66.3 ± 0.4    61.3 ± 0.4
NBNN           40.0 ± 2.0    67.2 ± 2.5    70.7 ± 1.2
I2CDML         47.9 ± 1.3    72.8 ± 2.1    73.8 ± 1.6
H-L2L(hp-β)    76.2 ± 0.02   67.8 ± 3.05   66.0 ± 3.01
DA-NBNN        52.8 ± 3.7    76.2 ± 2.5    76.6 ± 1.7
OURS           55.0 ± 3.3    77.5 ± 2.0    78.2 ± 1.4

Table 1: 31-class Office dataset experiments, semi-supervised setting 


The third and last set of experiments uses more than one source domain. The results 
can be seen in Figure 3. Not all algorithms can be extended to the case of several 
sources, so only those that can were included in the comparison. For these experiments, 
the exact test set of [20] has been used. 



Fig. 3: Accuracy on target domains with multiple sources (A: Amazon, W: Webcam, 
D: DSLR), 31 classes, semi-supervised setting 






















5.3 Discussion 


The biggest advantage of our proposed method is its simplicity combined with its strong 
performance as the number of classes and source domains grows. It also performs 
surprisingly well in comparison to other algorithms. The results in Figure 2 show that, 
while different algorithms have varying performance across test settings, our method is 
never worse than the second best. In particular, compared to DA-NBNN [19] (the state of 
the art among the methods that exploit an NBNN approach), our method outperforms it 
in two cases (A-W and C-W), while DA-NBNN performs better in two cases (C-A and 
A-C). In the remaining two cases (W-A, W-C) their performance is close; in fact, the 
p test shows that in these two experiments there is no statistical evidence of superiority 
for either algorithm. 

Our method performs significantly better than L2L [?], the state of the art among 
methods that do not use NBNN. In four of the experiments, L2L achieves lower accuracy 
than ours, while it is superior in only one setting. Note that the accuracy values 
reported for L2L have been taken from [20], where no result was reported for the C-W 
experiment. 

Using the 31-class Office setting, one can study and compare the scalability of the 
algorithms with respect to the number of classes. For our method, this type of 
scalability is straightforward: since there is no training, adding classes only requires 
adding their support sets. Table 1 shows that, performance-wise, our method scores 
higher than DA-NBNN in all three experiments and better than L2L in two out of three 
cases. 

Figure 3 compares the recognition rate for all possible combinations of two sources 
and one target in the Office dataset. For DA-NBNN it is not clear how it could be 
extended to this case, and no such experiments have been reported by its authors. 
L2L supports this case and has been included in the benchmark. Our method 
outperforms all the others in all three experiments. 

An open issue in our method is, of course, which percentage of the source data should 
be randomly selected and added to the target data, in relation to the data augmentation 
procedure. The results shown in Figures 2b-2d indicate that, in general, the combination 
of source plus target data augmentation and random sampling of around 20% of the 
patches-based features from the source achieves strong performance, always better than 
the original data. Still, as can be seen from the figures, the optimal sampling percentage 
and/or data augmentation strategy may vary across settings. Although accuracy results 
are on average quite stable, and therefore the algorithm could be used in online systems 
even in its current form with good expectations about performance, it would be desirable 
to further explore the issue of data selection and find principled ways of selecting the 
patches to transfer from the source to the target, so as to have guarantees about the 
optimality of the procedure. Of course, that would come at the expense of the current 
negligible computational cost of the approach. 




6 Conclusions 


The contribution of this paper is a learning free Naive Bayes Nearest Neighbor based 
domain adaptation method that is competitive with the current state of the art on the 
standard Office+Caltech benchmark database, and that achieves the state of the art when 
the number of classes and sources grows. The method consists of transferring a random 
selection of patches-based local features from the source to the target, combined with a 
data augmentation strategy borrowed from the CNN literature. The resulting algorithm is 
extremely simple but also remarkably effective, especially when the number of classes 
and sources grows. An open challenge is how to select the best percentage of source 
data to add to the target: even though our experimental evaluation indicates that, as a 
rule of thumb, sampling around twenty percent of the overall sample data (i.e. after data 
augmentation) generally leads to very good results, future work will focus on 
determining how much to sample in a principled manner, without excessively increasing 
the computational cost of the approach. 